Attention mechanisms are widely used in machine learning, particularly for text, in large language models, and in other forms of deep learning. The concept is loosely inspired by human attention, which plays a central role in cognition.
Given a sequence of tokens (e.g. words in a text), simple window-based methods treat all past tokens within the window as equally relevant when predicting or otherwise learning from the current token. In contrast, attention mechanisms attempt to identify the past tokens that are especially relevant and give them higher weight (attention) during training. This may be achieved by learning an interest vector for each input (as part of the deep learning process) that in some way represents its topic area, and then matching the interest vectors of past tokens against that of the current token. Some attention mechanisms also retain past tokens that have proved especially relevant to later tokens, allowing a form of memory beyond the window. Salience can be combined with attention by rating how unusual tokens and token sub-sequences are, and using these ratings, together with similarity to the current token, to weight past tokens.
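As a rough illustration of the matching step described above, the following Python sketch compares a current token's interest vector against those of past tokens using dot products and normalises the scores with a softmax, so that more similar past tokens receive higher weight. It is a minimal sketch under assumed inputs: the vector values and the function name attention_weights are invented for illustration and do not come from any particular library or from the text above.

```python
import numpy as np

def attention_weights(past_vectors, current_vector):
    """Weight past tokens by how well their interest vectors match
    the current token's interest vector (dot-product similarity),
    normalised with a softmax so the weights sum to 1."""
    scores = past_vectors @ current_vector           # one similarity score per past token
    scores = scores / np.sqrt(len(current_vector))   # scale to keep scores numerically stable
    exp = np.exp(scores - scores.max())              # softmax, shifted for numerical stability
    return exp / exp.sum()

# Hypothetical example: three past tokens with 4-dimensional interest vectors.
past = np.array([[0.9, 0.1, 0.0, 0.2],
                 [0.1, 0.8, 0.3, 0.0],
                 [0.2, 0.1, 0.9, 0.4]])
current = np.array([0.8, 0.2, 0.1, 0.1])

weights = attention_weights(past, current)
print(weights)  # past tokens whose vectors align with the current token get larger weights
```

In a trained model the interest vectors would be produced by learned projections of the token representations rather than fixed by hand, but the weighting step follows the same pattern of similarity scoring followed by normalisation.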